Amendments to the Claims:

This listing of claims will replace all prior versions, and listings, of claims in the
application:

Listing of Claims:

1. (currently amended) A method for crawling documents comprising:
receiving a uniform resource locator (URL); 

receiving at least two different copies of a document associated with the 
URL; and 

determining whether a web site corresponding to the URL uses session 
identifiers based on a comparison of URLs that are within the document and that 
change between the at least two different copies of the document, where the web
site is determined to use session identifiers when a portion of the URLs that 
change between the at least two different copies of the document is greater than 
a threshold. 

2. (original) The method of claim 1, wherein the document is a home
page of the web site. 

3. (previously presented) The method of claim 1, further comprising:
extracting, when the URL corresponds to a web site that uses session
identifiers, a session identifier from the URL to obtain a clean URL; and

determining whether the URL has already been crawled based on a 
comparison of the clean URL to a set of clean URLs that represent previously 
crawled URLs. 

4. (previously presented) The method of claim 3, wherein the
compared URLs that change include URLs that are local to the web site. 

5. (original) The method of claim 3, wherein the session identifiers 
from the URLs are extracted using rules for the web site. 

6. (original) The method of claim 5, wherein the rules are determined 
automatically. 

7. (original) The method of claim 3, further comprising: 
receiving the URL as a URL from a previously crawled web document. 

8. (original) The method of claim 3, further comprising: 

crawling the URL when the URL is determined to not already have been 
crawled. 

9. (canceled) 

10. (currently amended) A method for identifying web sites that use
session identifiers comprising: 

downloading at least two different copies of at least one document from a 
web site; 

extracting uniform resource locators (URLs) from the two different copies
of the web document; 

comparing the extracted URLs of the two different copies of the document; and

determining whether the web site uses session identifiers based on the
comparison when the comparison indicates that at least a portion of the URLs
change between the two different copies.

11. (canceled)

12. (original) The method of claim 10, wherein extracting URLs from 
the two different copies of the document includes extracting only URLs that are 
local to the web site. 

13. (original) The method of claim 10, wherein the document is a home
page of the web site. 

14. (original) The method of claim 10, further comprising: 
analyzing the extracted URLs, when the web site is determined to use
session identifiers, to generate at least one rule identifying how the session
identifiers are embedded in the URLs. 

15. (original) A device comprising: 

a spider component configured to crawl web documents associated with at
least one web site; and 

a session identifier component configured to determine whether the web 
site uses session identifiers based on a comparison of a portion of uniform 
resource locators (URLs) that change between different copies of at least one 
web document downloaded from the web site. 

16. (original) The device of claim 15, wherein the spider component
further comprises: 

at least one fetch component configured to download content from a 
network; and 

a content manager configured to extract URLs from the downloaded 
content. 

17. (original) The device of claim 16, wherein the spider component
further comprises: 

a URL manager configured to store the extracted URLs. 

18. (original) The device of claim 15, wherein the at least one web
document is a home page of the web site. 

19. (original) The device of claim 15, wherein the portion of the URLs 
that change are identified from URLs that are local to the web site. 

20. (original) The device of claim 15, further comprising:

a session rule generator configured to generate rules describing how the 
web site embeds session identifiers in the at least one web document. 

21. (currently amended) A device comprising:

means for downloading at least two different copies of at least one web 
document from a web site; 

means for extracting uniform resource locators (URLs) from the two 
different copies of the web document; 

means for comparing the extracted URLs of the two different copies of the 
web document; and 

means for determining whether the web site uses session identifiers
based on the comparison when the comparison indicates that at least a portion of
the URLs change between the two different copies.

22. (canceled) 

23. (original) The device of claim 21, wherein the means for extracting
URLs from the two different copies of the web document includes means for 
extracting only URLs that are local to the web site. 

24. (original) The device of claim 21, wherein the web document is a
home page of the web site. 

25. (original) The device of claim 21, further comprising:

means for analyzing the extracted URLs, when the web site is determined 
to use session identifiers, to generate rules describing how the session identifiers 
are embedded in the URLs. 

26. (currently amended) A computer-readable medium containing 
programming instructions that when executed by at least one processor cause 
the processor to perform a method for identifying web sites that use session 
identifiers including: 

downloading at least two different copies of at least one document from a 
web site; 

extracting uniform resource locators (URLs) from the two different copies 
of the document; 

comparing the extracted URLs of the two different copies of the web 
document; and 

determining whether the web site uses session identifiers based on the
comparison when the comparison indicates that at least a portion of the URLs
change between the two different copies.

27. (canceled) 

28. (original) The computer-readable medium of claim 26, wherein 
extracting URLs from the two different copies of the web document includes 
extracting only URLs that are local to the web site. 

29. (original) The computer-readable medium of claim 26, further
web document is a home page of the web site. 

30. (original) The computer-readable medium of claim 26, further 
comprising instructions that cause the at least one processor to: 

analyze the extracted URLs, when the web site is determined to use 
session identifiers, to generate at least one rule describing how the session 
identifiers are embedded in the URLs. 



31. (canceled) 
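
For illustration only, the detection recited in claims 1, 10 and 12 and the clean-URL step of claim 3 could be sketched roughly as follows. This is not part of the claims and is not asserted to be the applicant's implementation. The function names, the regular-expression link extraction, the way the changed portion is measured (URLs present in only one copy, divided by all URLs seen), the default threshold of 0.5, the host `site.example`, and the query parameter `sid` are all assumptions made for the example.

```python
import re
from urllib.parse import parse_qsl, urlencode, urljoin, urlparse, urlunparse

# Hypothetical, simplified link extractor: matches quoted href attributes only.
HREF_RE = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def extract_local_urls(html, base_url):
    """Return the absolute URLs in html that share base_url's host."""
    host = urlparse(base_url).netloc
    urls = set()
    for href in HREF_RE.findall(html):
        absolute = urljoin(base_url, href)
        if urlparse(absolute).netloc == host:
            urls.add(absolute)
    return urls


def changed_fraction(urls_a, urls_b):
    """Fraction of all extracted URLs that appear in only one copy."""
    all_urls = urls_a | urls_b
    if not all_urls:
        return 0.0
    return len(urls_a ^ urls_b) / len(all_urls)


def uses_session_ids(copy_a, copy_b, base_url, threshold=0.5):
    """Assumed reading of the threshold test: flag the site when the
    changed fraction of local URLs exceeds the threshold."""
    urls_a = extract_local_urls(copy_a, base_url)
    urls_b = extract_local_urls(copy_b, base_url)
    return changed_fraction(urls_a, urls_b) > threshold


def clean_url(url, session_param):
    """Drop one query parameter (an assumed, hand-written rule) from url."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != session_param]
    return urlunparse(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    base = "http://site.example/"
    first = '<a href="/p?sid=abc">p</a> <a href="http://other.example/z">z</a>'
    second = '<a href="/p?sid=def">p</a> <a href="http://other.example/z">z</a>'
    print(uses_session_ids(first, second, base))
    seen = {clean_url("http://site.example/p?sid=abc", "sid")}
    print(clean_url("http://site.example/p?sid=def", "sid") in seen)
```

In this sketch two downloads of the same page whose local links differ only in `sid` are flagged, and both variants reduce to the same clean URL, so the second would be treated as already crawled.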



